Data Partitioning for Incremental Data Mining

Authors

  • Nittaya Kerdprasop
  • Kittisak Kerdprasop
Abstract

Data repositories of interest in data mining applications can be very large. Many existing learning algorithms do not scale up to extremely large data sets. One approach to this problem is to apply the concept of incremental learning. However, incremental data mining is not the same as incremental machine learning. The former handles one subset of data at a time, whereas the latter handles a single data instance at a time. The size of each data subset determines both the performance and the speed of the mining process. We therefore focus this study on partitioning the data into appropriate subsets and propose an algorithm that returns a data subset for both classification and association mining tasks. We also perform a set of experiments to observe the behavior of classification and association mining under various data partitionings. The experimental results confirm our criteria for data partitioning.
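
The distinction drawn in the abstract, mining one subset of data at a time rather than one instance at a time, can be illustrated with a minimal sketch. This is not the authors' partitioning algorithm, which the abstract does not specify; the function names `partition` and `incremental_item_counts`, the fixed `subset_size` parameter, and the toy transactions are all assumptions made for illustration. The sketch splits a list of transactions into fixed-size subsets and updates item support counts one subset at a time, as a first step of association mining might.

```python
from collections import Counter
from typing import Iterator, List, Sequence


def partition(data: Sequence, subset_size: int) -> Iterator[Sequence]:
    """Yield consecutive subsets of at most `subset_size` records (hypothetical helper)."""
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    for start in range(0, len(data), subset_size):
        yield data[start:start + subset_size]


def incremental_item_counts(transactions: List[List[str]], subset_size: int) -> Counter:
    """Accumulate item support counts by processing one data subset at a time."""
    counts: Counter = Counter()
    for subset in partition(transactions, subset_size):
        # Only the current subset is examined in this pass.
        for transaction in subset:
            # Count each item at most once per transaction.
            counts.update(set(transaction))
    return counts


# Toy data, invented for illustration only.
transactions = [["a", "b"], ["a"], ["b", "c"], ["a", "c"], ["a"]]
subset_sizes = [len(s) for s in partition(transactions, 2)]
support = incremental_item_counts(transactions, 2)
```

In this sketch `subset_size` is the only tuning knob: a smaller value means more passes over smaller subsets, a larger value means fewer passes that each hold more data. How to choose it well is the question the paper addresses.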

Similar resources

Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques

Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...

Analytical Comparison of Some Traditional Partitioning based and Incremental Partitioning based Clustering Methods

Data clustering is a highly valuable field of computational statistics and data mining. It can be considered the most important unsupervised learning technique, as it deals with finding structure in a collection of unlabeled data. Clustering is the division of data into groups of similar objects. A major difficulty in the design of data clustering algorithms is that, in the majority of applica...

D*: A Data Storage and Retrieval System for Scientific Studies

D* is a novel system for data storage and retrieval appropriate for advanced scientific studies, as in high-energy physics, environmental sciences, and astronomy. The design of the D* system is based on certain principles of organizing and accessing multi-dimensional data on storage, whose pursuit requires that the storage system acquire a greater knowledge about the data. This provides a basis...

Static Analysis of Software Systems

This research addresses the design and development of an incremental software architecture recovery and evaluation environment using data mining techniques. The environment is interactive and provides: pattern-based architectural recovery using a query language and approximate graph pattern matching; optimization clustering; partitioning; and view-based architectural design evaluation. These te...

Multi-Output Adaptive Neuro-Fuzzy Inference System for Prediction of Dissolved Metal Levels in Acid Rock Drainage: a Case Study

Pyrite oxidation, Acid Rock Drainage (ARD) generation, and associated release and transport of toxic metals are a major environmental concern for the mining industry. Estimation of the metal loading in ARD is a major task in developing an appropriate remediation strategy. In this study, an expert system, the Multi-Output Adaptive Neuro-Fuzzy Inference System (MANFIS), was used for estimation of...

Publication date: 2003